Random phenomena are all around us: the outcome of a coin flip, weather patterns, the stock market, getting a disease
Random means non-deterministic, i.e. phenomena where there is an associated uncertainty about the outcome; we cannot predict perfectly (next day weather, roll of a die, price of Apple stock)
Inability to predict perfectly could be due to lack of full information (e.g. flipping coin) or intrinsic (quantum mechanics)
Error in measurements: anything we measure has uncertainty associated with it
Randomness does not imply complete lack of pattern (e.g. many tosses of a fair coin will come up about half heads and half tails); meteorologists can assess (and quantify!) whether the chance of rain is high or low
Probability is the branch of mathematics that allows us to model randomness and studies properties of random phenomena
Probability has application in computer science, machine learning, artificial intelligence, finance, statistics
Probability allows us to model observations/data arising from phenomena/processes that are or can be modeled as random
Statistics can be thought of as providing solutions to the inverse problem of probability: given observations/data infer properties about the underlying random phenomenon/process that generated the data
Probability can be defined formally (at different levels of mathematical rigor) or can be treated more intuitively
Definition (sample space): the set of all possible outcomes of an experiment of interest
Examples:
Single coin tossing: $\Omega = \{H, T\}$
Month of birth of a randomly chosen person: $\Omega = \{\text{Jan}, \text{Feb}, \dots, \text{Dec}\}$
Whether a Youtube video will be clicked when presented to a potential viewer: $\Omega = \{\text{click}, \text{no click}\}$
Event Space: all possible events (collection of outcomes) we will consider.
For discrete sample spaces the event space is typically the set of all possible subsets of the sample space $\Omega$
Example: single coin toss
Sample space: $\Omega = \{H, T\}$
Event space: $\{\emptyset, \{H\}, \{T\}, \{H, T\}\}$
Example: birth month of a randomly chosen person
Q: how many elements in the event space?
We require that unions and intersections of events are also events:
E.g.
Venn Diagrams:

E.g.
Example: Let
(prove DeMorgan's Laws to practice with set operations)
A probability function is a 'set function' that assigns a real number to each event in
2.
The probability reflects the chances an event occurs, 0 being impossible and 1 being certain
Or perhaps a more reasonable assignment of probabilities would be proportional to the number of days in each month:
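That days-in-month assignment can be written out explicitly in R (a sketch, ignoring leap years):

```r
# Probability of each birth month proportional to its number of days
days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
p_month <- days / 365
sum(p_month)   # a valid assignment: the probabilities add to 1
p_month[1]     # P(January) = 31/365, about 0.0849
```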
(This shows that it is us, the users who assign probabilities; probabilities are not 'laws of nature')
Any probability in a discrete sample space can be constructed like in the previous examples:
is a probability function.
Exercise: show this
Property 3 holds for any number of disjoint events:
E.g. the sets:
Are pairwise disjoint
In general:
Provided the events are pairwise disjoint:
If
(Good practice exercise to show these)
E.g. Flip a coin twice:
E.g. Flip a coin
E.g. Flip a coin and then pick a month at random:
Q: How many elements in
If we have a probability function
E.g.
(This is how we model independence, which we will cover next week)
In many applications it makes sense to assign the same probability to all elements of a finite sample space
E.g. two coin flip:
In general, a uniform probability space with
And the probability of an event is the number of elements in the event divided by the total number of elements in the sample space:
E.g. Pick a single card from a well shuffled standard 52-card deck:
Many problems in probability theory require that we count the number of ways that a particular event can occur. This kind of counting falls under the area of mathematics called combinatorics.
The Multiplicative counting principle (MP).
Suppose that we perform
possible outcomes for the sequence of
Example 1: Need to choose a password for an online account. Password must consist of two lowercase letters (a to z) followed by one capital letter (A to Z) followed by four digits (0,1, \dots, 9).
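By the multiplicative principle the count for Example 1 is $26 \cdot 26 \cdot 26 \cdot 10^4$; a quick check in R:

```r
# two lowercase letters, one capital letter, four digits
26 * 26 * 26 * 10^4
## [1] 175760000
```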
Example 2: How many subsets does a set with
How many five-card hands are possible from a standard fifty-two card deck? (if order matters)
In general, a
The number of
How many five-card hands are possible from a standard fifty-two card deck? (if order does not matter)
Each arrangement was counted $5! = 120$ times
In general, a
The number of
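These counts are easy to verify in R: `prod(52:48)` gives the ordered hands (permutations) and `choose(52, 5)` the unordered ones (combinations):

```r
prod(52:48)                  # ordered 5-card hands: 52*51*50*49*48
## [1] 311875200
choose(52, 5)                # unordered 5-card hands
## [1] 2598960
prod(52:48) / factorial(5)   # each unordered hand was counted 5! times
## [1] 2598960
```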
Suppose we deal a 5-card hand from a regular 52-card deck. Which is larger, P(One king) or P(Two hearts)?
4 * choose(48, 4) / choose(52, 5)
## [1] 0.2994736
choose(13, 2) * choose(39, 3) / choose(52, 5)
## [1] 0.2742797
A dice game that played an important role in the historical development of probability:
Chevalier de Méré had been betting that, in four rolls of a die, at least one six would turn up.
He was winning consistently and, to get more people to play, he changed the game to bet that, in 24 rolls of two dice, a pair of sixes would turn up.
De Méré lost with 24 and felt that 25 rolls were necessary to make the game favorable.
Was De Méré right?
Single die roll in R:
sample(1:6, size = 1, replace = TRUE)
## [1] 2
Four rolls of one die:
sample(1:6, size = 4, replace = TRUE)
## [1] 1 2 4 3
Checking if a six came up:
any(sample(1:6, size = 4, replace = TRUE) == 6)
## [1] FALSE
Full simulation:
nreps = 1000
set.seed(2021)
results = numeric(0)
for (i in 1:nreps) results[i] = any(sample(1:6, size = 4, replace = TRUE) == 6)
mean(results)
## [1] 0.507
Questions:
Based on this simulation result, do you think the bet is favorable?
Derive/compute the actual probability (hint: use that all outcomes of the four rolls of a die are equally likely)
Simulate the second scheme (24 rolls of two dice). What can you say about the favorability of the bet?
Derive/compute the actual probability. How about for 25 rolls of two dice?
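One way to settle the questions above (a sketch, assuming fair dice): compute the exact probabilities with the equally-likely-outcomes argument and simulate the second scheme.

```r
# Exact probabilities via complements
p4  <- 1 - (5/6)^4     # at least one six in 4 rolls: ~0.518 (favorable)
p24 <- 1 - (35/36)^24  # at least one double six in 24 rolls: ~0.491 (unfavorable)
p25 <- 1 - (35/36)^25  # with 25 rolls: ~0.506 (favorable, as De Méré suspected)
c(p4, p24, p25)

# Simulation of the second scheme (24 rolls of two dice)
set.seed(2021)
double_six <- replicate(10000, {
  die1 <- sample(1:6, size = 24, replace = TRUE)
  die2 <- sample(1:6, size = 24, replace = TRUE)
  any(die1 == 6 & die2 == 6)
})
mean(double_six)  # should be close to p24
```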
How many people do we need in a room for the probability that at least two share a birthday to exceed 1/2?
Assume all 365 birthdays are equally likely.
Perform a simulation in R to answer this question (hint: use the base R function 'duplicated' to check whether there are matching birthdays)
Compute the probability by mathematical derivation and plot the probability as a function of
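A simulation along the lines of the hint, using `duplicated`; the room size `n = 23` below is just an illustrative choice:

```r
set.seed(2021)
birthday_match_prob <- function(n, nreps = 10000) {
  # fraction of simulated rooms of n people with at least one shared birthday
  mean(replicate(nreps, any(duplicated(sample(1:365, n, replace = TRUE)))))
}
birthday_match_prob(23)  # the exact value for n = 23 is about 0.507
```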
E.g. flip a coin until first heads appears:
What is the right probability space for this experiment?
Sample space has to be infinite because there is no guarantee the experiment will terminate in a finite number of steps!
If we assume that after
Does this result in a probability function?
For infinite sample spaces we need to change the finite additivity rule to a countable additivity rule:
2'.
Verification of
(Used the geometric series: $\sum_{k=0}^{\infty} r^k = \frac{1}{1-r}$ for $|r| < 1$.)
Q: What's the probability that it'll take an even number of tosses until the first heads?
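For a fair coin with independent tosses, $P(\text{first heads on toss } k) = (1/2)^k$, so summing over even $k$ gives $\sum_{j=1}^{\infty} (1/2)^{2j} = \frac{1/4}{1 - 1/4} = 1/3$. A numerical check in R (note that `dgeom` counts failures before the first success, so an even number of tosses corresponds to an odd number of failures):

```r
# P(even number of tosses until first heads) for a fair coin; ~ 1/3
sum(dgeom(seq(1, 201, by = 2), prob = 0.5))
```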
A set is finite if its elements can be put in one-to-one correspondence with $\{1, 2, \dots, n\}$ for some natural number $n$:
E.g., the set of students in the classroom, the set of inhabitants in the world, the set of stars in the Milky Way.
A set is countable if its elements can be put in one-to-one correspondence with the natural numbers:
E.g. The set of natural numbers, the set of odd numbers
Examples of infinite uncountable sets:
Sample spaces (finite)
Probability functions on discrete spaces
Uniform probability spaces
Multiplicative counting principle
Permutations/combinations
E.g. Flip a coin twice:
E.g. Flip a coin
E.g. Flip a coin and then pick a month at random:
Q: How many elements in
If we have a probability function
E.g.
This is how we model independence.
E.g. number of coin flips until first heads appears:
What is the right probability space for this experiment?
Sample space has to be infinite because there is no guarantee the experiment will terminate in a finite number of steps!
If we assume that after
Does this result in a proper probability function?
For infinite sample spaces we need to change the finite additivity rule to a countable additivity rule:
Verification of
(Used the geometric series: $\sum_{k=0}^{\infty} r^k = \frac{1}{1-r}$ for $|r| < 1$.)
By extension of the rule for finite sample spaces, the probability defined above is a proper probability function.
Example: You roll a fair 6-faced die. Let
We can write:
If
Fraction of the probability
Tells us how to update probability in the presence of new information
Example:
What is the probability that two cards drawn at random from a deck of playing cards will both be aces?
What is the probability that two cards drawn at random from a deck of playing cards will both be aces if after dealing the first card it is an Ace?
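The arithmetic for both questions, as a quick check (sequential draws without replacement):

```r
(4/52) * (3/51)  # P(both cards are aces) = 1/221, about 0.0045
3/51             # P(second is an ace | first was an ace), about 0.0588
```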
From the definition we get the properties:
Multiplication rule:
Bayes rule:
Example: There are approximately 2.6 physicians per 1,000 people in the US (from world public health data by country)
Probability of choosing a physician if we randomly choose a US inhabitant $= 2.6/1000 = 0.0026$
For fixed
If the sample space can be partitioned as
(holds even for a countable partition)
In particular, for any event
The probability of infection from a certain virus upon exposure is 10% for children age < 13, 5% for ages 13-60, and 15% for ages 60+. What is the probability that a random individual is infected upon exposure in a population where
Let
A diagnostic test has 99% sensitivity and 98% specificity.
If the population prevalence of the disease is 3%, what is the probability that an individual who tests positive is affected with the disease?
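Applying Bayes rule with sensitivity $P(+\mid D) = 0.99$, specificity $P(-\mid D^c) = 0.98$, and prevalence $P(D) = 0.03$:

```r
sens <- 0.99; spec <- 0.98; prev <- 0.03
# P(disease | positive) = P(+|D)P(D) / [P(+|D)P(D) + P(+|not D)P(not D)]
(sens * prev) / (sens * prev + (1 - spec) * (1 - prev))  # about 0.605
```

Despite the accurate test, only about 60% of positives actually have the disease, because the disease is rare.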
You're on a game show, and you're given the choice of three doors:
Behind one door is a car
Behind the others, goats
You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat.
He then says to you, "Do you want to switch to door No. 2 or keep the prize behind door No. 1?"
Should you switch? Answer: Yes! Switching gives you a 2/3 probability of winning
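A simulation sketch of the game, assuming the host always opens a goat door different from your pick:

```r
set.seed(2021)
switch_wins <- replicate(10000, {
  car  <- sample(1:3, 1)   # door hiding the car
  pick <- sample(1:3, 1)   # contestant's initial choice
  goat_doors <- setdiff(1:3, c(pick, car))
  # careful: sample(x, 1) misbehaves when x has length 1
  opened <- if (length(goat_doors) == 1) goat_doors else sample(goat_doors, 1)
  switched <- setdiff(1:3, c(pick, opened))
  switched == car
})
mean(switch_wins)  # close to 2/3; staying wins the complementary 1/3
```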

Two events
E.g. fair coin tossed twice:
The events
(in fact any first toss outcome is independent of any second toss outcome)
Independence between
And also the equation above holds replacing any number of the
Pairwise independence does not imply mutual independence!
Example: Two tosses of a fair coin
These are pairwise independent but not mutually independent
Countable sample spaces (e.g. flipping a coin until first head,
Conditional Probability:
Law of total probability:
Bayes theorem:
Independence:
For
Q: Is disjoint the same as independent? No! Disjoint events cannot both occur, while independent events don't affect each other's probabilities.
A discrete random variable is a function
E.g. The number in the upper face of the rolled die, the sum of two dice
Notation:
The pmf of a discrete random variable taking values
If
E.g. Fair coin flip:
The cdf of a discrete random variable taking values
If
E.g. Fair coin flip:
Both the pmf and the cdf completely characterize all the probabilistic information about a random variable (note: two different random variables can still have the same pmf and cdf).

A random variable has a Bernoulli distribution
Models experiments with only two possible outcomes
E.g. coin toss (H vs. T), die comes up six (yes vs. no)
A random variable with a Bernoulli distribution is called a Bernoulli trial
Models the number of successes in
For
Suppose it is known that 5% of adults who take a certain medication experience negative side effects. What is the probability that more than
pmf, cdf, and Random generation of a binomial random variable
# pmf
dbinom(3, size=10, prob=0.3)
## [1] 0.2668279
# cdf
pbinom(3, size=10, prob=0.3)
## [1] 0.6496107
# Random generation
rbinom(n=1, size=10, prob=0.3)
## [1] 2
rbinom(n=3, size=10, prob=0.3)
## [1] 0 1 3
1 - pbinom(1, size = 100, prob = 0.05)
## [1] 0.9629188
pbinom(1, size = 100, prob = 0.05, lower.tail = FALSE)
## [1] 0.9629188
pbinom(5, size = 100, prob = 0.05, lower.tail = FALSE)
## [1] 0.3840009
pbinom(15, size = 100, prob = 0.05, lower.tail = FALSE)
## [1] 3.705408e-05
par(mar=c(6,8,5,1))
plot(0:5, dbinom(0:5, size=5, prob=0.4), col='red4', type='p', pch=16, cex=1.3, xlab='x', ylab='p(x)', cex.lab=2, cex.axis=2)
par(mar=c(6,8,5,1))
plot(stepfun(0:5, c(0, pbinom(0:5, size=5, prob=0.4))), pch = 1, lwd=2, col='steelblue', xlab='x', ylab='F(x)', cex.lab=2, cex.axis=2, main='', verticals = F)
Generating
Bernoulli_trials_1 = sample(0:1, 10, replace = TRUE, prob=c(0.3, 0.7))
Bernoulli_trials_1
## [1] 1 1 0 1 1 0 1 0 1 0
sum(Bernoulli_trials_1)
## [1] 6
Generating
Bernoulli_trials_2 = rbinom(10, size = 1, prob=0.3)
Bernoulli_trials_2
## [1] 0 1 0 0 0 0 0 0 0 0
sum(Bernoulli_trials_2)
## [1] 1
Directly sampling from the binomial:
rbinom(1, size=10, prob=0.3)
## [1] 5
A discrete random variable
Models the (discrete) waiting time until an event happens. E.g. number of trials till first heads.
You and a friend want to go to a concert, but there's only one ticket left. The salesperson decides to toss a coin until heads appears. In each toss heads appears with probability
The probability it'll take
pmf, cdf, and random generation of a geometric random variable
Warning: The definition of the geometric distribution in R is the number of failures before the first success, i.e. the number of trials minus one.
dgeom(x=5, prob = 0.1)
## [1] 0.059049
pgeom(10, prob= 0.1)
## [1] 0.6861894
rgeom(n=1, prob= 0.1)
## [1] 0
rgeom(n=3, prob= 0.1)
## [1] 4 3 4
par(mar=c(6,8,5,1))
plot(0:50, dgeom(0:50, prob=0.1), col='red4', type='p', pch=16, cex=1, xlab='x', ylab='p(x)', cex.lab=2, cex.axis=2)
par(mar=c(6,8,5,1))
plot(stepfun(0:50, c(0, pgeom(0:50, prob=0.1))), pch = 1, lwd=2, col='steelblue', xlab='x', ylab='F(x)', cex.lab=2, cex.axis=2, main='', verticals = F)
Discrete random variables:
Take finite or countable values
Completely characterized by their pmf or cdf
Bernoulli:
Binomial: number of successes in
Geometric distribution: number of trials until first success in repeated Bernoulli experiments with identical probability of success
A continuous random variable is a function
for some function

More generally, if
Properties of the pdf:
For a continuous random variable
Part I: Let
Then
for all
Part II: Let
If
Let a continuous random variable

Compute
What is the probability density function of
Compute


There is no analytical formula for the cdf but it can be numerically computed.


Let
The median of a distribution is its

pdf, cdf, random generation, and quantile of a uniform random variable
dunif(3, min=1, max=5)
## [1] 0.25
punif(3, min=1, max=5)
## [1] 0.5
runif(n=3) # default is uniform[0,1]
## [1] 0.007446826 0.944775617 0.292623820
qunif(0.5, min=1, max=3)
## [1] 2
pdf, cdf, random generation, and quantile of an exponential random variable
dexp(3, rate = 0.5)
## [1] 0.1115651
pexp(3, rate = 2)
## [1] 0.9975212
rexp(n=5) # default rate is 1
## [1] 4.60679804 0.32611096 0.01889765 1.49367378 0.70605987
c(qexp(p=0.5), -log(1/2))
## [1] 0.6931472 0.6931472
pdf, cdf, random generation, and quantile of a normal random variable
dnorm(x=3, mean = 1, sd=2)
## [1] 0.1209854
dnorm(x=-1, mean = -2, sd=0.5)
## [1] 0.1079819
rnorm(n=5)
## [1] -0.4843401 -1.0643711 1.5258292 2.2244114 0.7464652
qnorm(0.5)
## [1] 0
pdf, cdf, random generation, and quantile of a Pareto random variable with location 1 and shape 2
library(EnvStats)
dpareto(x=3, location = 1, shape=2)
## [1] 0.07407407
ppareto(q=3, location = 1, shape=2)
## [1] 0.8888889
rpareto(n=5, location = 1, shape=2)
## [1] 1.471863 1.356510 1.347556 1.011351 2.216670
qpareto(p=0.5, location = 1, shape=2)
## [1] 1.414214
To get to your destination you take a taxi if there is one waiting (probability 1/3) at the stand when you arrive or walk if there is no taxi waiting. A taxi takes you exactly 5 minutes. Walking to your destination takes you exactly 35 minutes. What is the cdf of the time to your destination?
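A sketch of this cdf in R: the travel time $T$ equals 5 with probability 1/3 and 35 with probability 2/3, so the cdf jumps at those two points.

```r
# F(t) = P(T <= t): 0 before 5, 1/3 on [5, 35), 1 from 35 onward
F_T <- function(t) ifelse(t < 5, 0, ifelse(t < 35, 1/3, 1))
F_T(c(0, 5, 20, 35, 60))  # gives 0, 1/3, 1/3, 1, 1
```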
To get to your destination you take a taxi if one is waiting (probability 1/3) at the stand when you arrive or walk if there is no taxi waiting. Walking to your destination takes you an amount of time distributed as
To get to your destination you take a taxi if one is waiting (probability 1/3) when you arrive or walk if there is no taxi. Walking to your destination takes you exactly 35 minutes. A taxi takes an amount of time distributed as
How about the probability density function? There isn't one!
This is an example of a MIXED random variable
MIXED random variables are mixtures of discrete and continuous random variables
Discrete random variables have probability mass function but do not have probability density function
Continuous random variables have probability density function but do not have probability mass function
Mixed random variables have neither probability mass function nor probability density function
All types of random variables (discrete, continuous, and mixed) have a cumulative distribution function!
Continuous random variables:
Take values in an uncountable set
Have a probability density function (pdf)
Continuous RV are completely characterized by their pdf or cdf
Uniform:
Exponential:
Normal:
Pareto:
Mixtures of distributions:
Discrete + Discrete = Discrete
Continuous + Continuous = Continuous
Discrete + Continuous = Neither discrete nor continuous (mixed)
For a continuous random variable, it follows from the definition of pdf and the fundamental theorem of calculus that for
But
For a continuous RV (but not for a discrete or mixed) this also equals
The expected value is a weighted average of the values a random variable takes, weighted by the probability of taking those values. It's the center of 'gravity' where the distribution 'balances'.
For a Discrete random variable
The support of a discrete random variable
Bernoulli trial,
A discrete random variable
for
Used to model number of events happening in a period of time e.g. number of mutations per unit length in a DNA strand, number of new patients (incidence rates), number of phone calls/particles arriving in a system, etc.
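By analogy with the other distributions covered here, the Poisson pmf, cdf, and random generation in R (the rate $\lambda = 3$ is just an illustrative choice):

```r
dpois(2, lambda = 3)   # pmf: P(X = 2)
## [1] 0.2240418
ppois(2, lambda = 3)   # cdf: P(X <= 2)
## [1] 0.4231901
rpois(n = 3, lambda = 3)
```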

For a Continuous random variable
NOTE: The expectation may not exist, e.g. for the Cauchy distribution
CDF:
PDF:
Expectation Calculation:
(Using
(integrating by parts)
For a mixed random variable
where
For a set
is called the indicator function of the set
What is the distribution of
Allows us to work with random variables (indicator functions) instead of sets and expectations instead of probabilities
Exercise: If
Example:
What is the cdf of
First,
For
Final CDF:
Continuous
Discrete
Allows us to compute the expectation of
In the previous example computing
The change-of-variable formula allows us to use a shortcut:
Only require us to integrate using the pdf of
Using Mathematica: Integrate[-Sqrt[(1+x)](2x^2-x^4), {x, 0, 1}]
Continuous
Discrete
Expectation
Continuous:
Discrete:
Mixed:
Variance
Change of variable formula/LOTUS
Continuous:
Discrete:
We are typically interested in not one, but multiple related random variables defined on the same space.
E.g. weight and height of a randomly selected person
Geometrically,
More generally, a random vector in
E.g. expression levels of
We can have discrete, continuous, and mixed random vectors
A random vector
Joint probability mass function:
In general, if
A random vector is continuous if it has a joint probability density function:
Converse is not true:
The cumulative distribution function (cdf) is defined for both discrete and continuous (and mixture) random vectors:
Just like for random variables, the pdf (continuous), the pmf (discrete), or the cdf (both), completely characterize probabilistically a random vector
For a continuous random vector:
The distributions (pdf, pmf, or cdf) of the component random variables are called the marginal distributions
For a continuous random vector:
For a discrete random vector:
Let
Determine:
The joint pmf of
The marginal pmf of
Suppose that the joint cumulative distribution function of

Determine the joint probability density function of
Determine the marginal cumulative distribution functions of
Determine the marginal probability density functions of
Find out whether
Determine
Suppose that the joint probability density function of
Find the constant
Determine the joint cumulative distribution functions of
Determine the marginal probability density functions of
Find out whether
Determine $\text{Cov}(X,Y)$ and
Equivalent to the factorization of the joint cdf as a product of the marginal cdfs:
For continuous random vectors, also equivalent to the factorization of the joint pdf as a product of the marginal pdfs:
For discrete random vectors, also equivalent to the factorization of the joint pmf as a product of the marginal pmfs:
Definition of independence and factorization equivalences extend to multiple random variables
NOTE this is important and not covered in the book
If
If
If
Examples:
Let
Then
The interpretation is analogous to that for random variables: the 'center' of the two-dimensional distribution, the center of mass if we think of probability as mass distributed on the surface of the plane
In general,
Example:
Consequences:
Expectation is linear:
Example of linearity:
Then,
For arbitrary random variables
The term:
is called the covariance of
It measures how much $X$ and $Y$ vary together
Converse is not true:
Example of additivity of variance for uncorrelated RVs
Then,
Covariance is linear in each of its terms:
Cauchy–Schwarz inequality
From the Cauchy–Schwarz inequality, using
(
Equivalently,
Converse is not true:
Covariance is not scale invariant: